Machine Learning-Based Methods for Enhancement of UAV-NOMA and D2D Cooperative Networks

Cooperative aerial and device-to-device (D2D) networks employing non-orthogonal multiple access (NOMA) are expected to play an essential role in next-generation wireless networks. Moreover, machine learning (ML) techniques, such as artificial neural networks (ANNs), can significantly enhance network performance and efficiency in fifth-generation (5G) wireless networks and beyond. This paper studies an ANN-based unmanned aerial vehicle (UAV) placement scheme to enhance an integrated UAV-D2D NOMA cooperative network. The proposed placement scheme selection (PSS) method for integrating the UAV into the cooperative network combines supervised and unsupervised ML techniques. Specifically, a supervised classification approach is employed, utilizing a two-hidden-layer ANN with 63 neurons evenly distributed among the layers. The output class of the ANN determines which unsupervised learning method, k-means or k-medoids, is employed. This specific ANN layout exhibits an accuracy of 94.12%, the highest among the ANN models evaluated, which makes it a strong candidate for accurate PSS predictions in urban locations. Furthermore, the proposed cooperative scheme allows pairs of users to be served simultaneously through NOMA from the UAV, which acts as an aerial base station. At the same time, the D2D cooperative transmission for each NOMA pair is activated to improve the overall communication quality. Comparisons with conventional orthogonal multiple access (OMA) and with alternative unsupervised ML-based UAV-D2D NOMA cooperative networks show that significant sum rate and spectral efficiency gains can be obtained through the proposed method under varying D2D bandwidth allocations.


Introduction
The utilization of unmanned aerial vehicles (UAVs) as UAV flying base stations (UFBSs) is of considerable interest in the context of new-generation wireless communication systems. UAV-enabled wireless communication systems can provide wireless coverage extension, capacity enhancement, communication restoration during disaster events, and aerial data collection within the framework of Internet of Things (IoT) applications [1,2]. In contrast to conventional wireless communication systems that depend on fixed terrestrial infrastructures, UFBSs are dynamic and simple to deploy and reconfigure. Thus, their use introduces several degrees of freedom in terms of flexibility, wide coverage, and communication restoration during disasters and temporary events. However, the anticipated advantages of deploying UFBSs depend heavily on their precise location within the region of interest, which determines whether terrestrial users are offered reliable and high-quality communication [3].
gorithm is proposed to solve the max-min rate optimization problem, which is subjected to the constraints of the total power, available bandwidth, UAV altitude, and antenna beamwidth. The numerical results have shown that the NOMA scheme outperforms OMA, in terms of achievable rate, for different system parameters. Subsequently, the authors in [17] developed a novel NOMA UAV-assisted offloading architecture for cellular networks to significantly enhance the system's spectrum efficiency. Specifically, the 3D trajectory design and power allocation optimization problem are formulated to maximize the system sum rate. For this purpose, ML-based methods, namely k-means and mutual deep Q-network (MDQN), are utilized to deal with this problem. Another strategy [18] proposes a resource allocation scheme for a UAV-assisted full-duplex (FD) NOMA system to improve spectrum efficiency, reduce terrestrial users' power requirements, and maintain quality of service (QoS) requirements. The method utilizes a joint uplink/downlink stepwise optimization approach to solve the NP-hard optimization problem. Simulation results demonstrate that the proposed method outperforms other methods in terms of spectrum and energy efficiency.
Besides the optimal placement of the UFBS and the selection of an efficient radio access technique, leveraging physical transmission techniques can further enhance the overall UAV communication quality. Device-to-device (D2D) communication is one such technique. For instance, in highly dense urban areas where several devices coexist within a few meters of each other, the devices can benefit from a cooperative transmission scheme. Consequently, integrating D2D communications into UAV networks has recently attracted considerable attention, and related issues have been studied in the literature [19][20][21][22][23]. In [19], the authors derived closed-form expressions for the outage probability in a UAV-assisted NOMA network with D2D communication capabilities. They also formulated a power control optimization problem to maximize the D2D sum rate while ensuring a minimum rate for each UAV-connected user. According to the simulation results, the proposed method is computationally efficient but achieves a lower sum rate than other methods. Furthermore, the energy-efficient resource allocation problem in D2D communications underlying UAV-enabled networks is investigated in [20]. In particular, this study optimizes the overall energy efficiency of all D2D pairs while ensuring the secrecy rates of all users via combined power control and channel allocation. The Lagrangian dual and Kuhn-Munkres algorithms are utilized to solve this problem, and the simulation results show that the proposed approach performs better than other benchmark methods. Moreover, the authors of [21] exploited the advantages of UAV-assisted communications and combined them with the NOMA technique. In particular, they present a D2D-enhanced UAV-NOMA network architecture in which D2D is added to improve the efficiency of file dispatching. To this end, a graph-based file dispatching protocol is provided to decrease the UAV-assisted file dispatching mission time and to control interference. Simulation results confirm the benefits of the proposed D2D-enhanced UAV-NOMA network architecture and the efficacy of the proposed protocol. The research presented in [22] proposed a novel approach to address disaster management issues utilizing a UAV-assisted SWIPT-enabled NOMA-based D2D network. The authors formulated a nonlinear power allocation optimization problem that maximizes the system's energy efficiency and solved it using the Dinkelbach approach. Simulation results show that the advanced NOMA system outperforms the ordinary NOMA scheme. Alternatively, ref. [23] investigated a sequential optimization problem for resource allocation and communication mode selection in a UAV-assisted D2D cellular network to improve energy efficiency and ensure satisfactory transmission rates for all ground UEs. The authors proposed a reinforcement learning-based scheme to solve this problem, which was shown to be effective through simulation results.

Contributions
As presented in the previously detailed literature review, several studies on standalone UAV networks utilize unsupervised machine learning methods such as k-means and k-medoids to place the UAV in the region of interest. However, applying these algorithms individually to a UAV-NOMA and D2D cooperative network might degrade the overall network quality while rendering the D2D network unnecessary. Hence, to achieve enhanced network quality, it is vital to consider the interactions and trade-offs between the two algorithms and the network elements and adopt an integrated approach [7,24].
Concerning the operation of the two placement methods, both k-means and k-medoids are centroid-based clustering techniques. The two methods are fed with the terrestrial users' coordinates as an input feature to find the point where the UFBS should be placed. In such scenarios, k-means behaves well when the terrestrial users form spherical clusters without outliers [24]. In contrast, k-medoids is robust to outliers and still represents the cluster center correctly when they are present [7]. Hence, by efficiently combining the k-means and k-medoids algorithms, the UAV can be positioned in the most suitable location to ensure effective coverage for D2D communication. This combined approach takes into account the similarities among the data points, the actual data points themselves, and potential outliers or noise in the data. As a result, it leads to a more precise and reliable UAV placement. Thus, the combination exploits the strengths of both k-means and k-medoids in determining the ideal UAV placement [25].
Nevertheless, whenever the UFBS needs to be relocated, the most suitable placement method has to be determined by comparing the results obtained from both clustering algorithms, i.e., k-means and k-medoids. This decision-making process requires the real-time execution of both ML methods, thus increasing the overall time complexity. Moreover, deciding which of the two clustering algorithms to use can be complex and may depend on several factors. Essentially, when the dataset contains non-spherical clusters, outliers, or clusters of different sizes, it is difficult to model the decision with a simple threshold boundary. Hence, it can be challenging to identify the unsupervised ML method that should be utilized.
Inspired by this observation, the placement scheme selection (PSS) can be regarded as a supervised classification problem, which can be handled through a fully connected artificial neural network (ANN) to enhance the overall system QoS. ANNs can be used to predict which clustering algorithm to use between k-means and k-medoids because they are able to learn the underlying patterns in the data and identify which algorithm is better suited for the given dataset. Moreover, ANNs can capture complex relationships between the input data and the output cluster labels, which can be difficult to model with a simple threshold boundary. Consequently, this paper presents and analyzes an ANN-based UAV placement scheme to enhance the network performance of an integrated UAV-NOMA and D2D cooperative network. The proposed method intelligently integrates the UFBS into the cooperative network by efficiently combining the k-means and k-medoids unsupervised ML algorithms. Concerning the UAV-NOMA and D2D cooperative network, pairs of users are simultaneously served through the UFBS, which utilizes a NOMA optimal user pairing and power allocation strategy. At the same time, terrestrial cooperation is enabled by adopting the D2D communication paradigm, thus improving the overall communication quality. To the authors' knowledge, this is the first time supervised machine learning techniques, such as the ANN, and unsupervised machine learning algorithms, such as k-means and k-medoids, are combined to improve the integrated UAV-NOMA D2D cooperative network. Specifically, the following major contributions are provided:
• An ANN-based UFBS placement framework is established in order to improve the overall communication quality of a UAV-NOMA and D2D cooperative network. Towards this end, supervised ML algorithms (ANN) and unsupervised ML algorithms (k-means and k-medoids) are combined.
• State-of-the-art data mining strategies are presented to transform raw data into an intelligible format for ANN algorithms and to avoid underfitting and overfitting. To the best of our knowledge, it is the first time that such strategies have been provided in the field of UAV-NOMA and D2D cooperative networks.
• A step-by-step approach to hyperparameter tuning in ANN models is provided to enhance the predictability of the UFBS placement procedure.
• For the UFBS NOMA transmission, an optimal power allocation and user pairing strategy is considered [26]. Also, the proposed scheme promotes the cooperation between aerial and D2D networks.

Structure
The remainder of this paper is organized as follows. Section 2 presents the considered system model, while Section 3 outlines the unsupervised machine-learning-based methods for the UFBS placement procedure. Next, the data collection, data pre-processing, learning, validation, and testing procedures, and the performance metrics of the proposed ANN-based placement scheme selection are outlined in Section 4. Finally, simulation results are given in Section 5, followed by conclusions and future directions in Section 6.

System Model
From the system point of view, we consider a cooperative UAV and D2D-aided wireless communication system, where the UFBS is mainly responsible for communication. The D2D scheme is employed between the ground mobile terminals (GMTs) to achieve higher data rates and spectral efficiency without the involvement of any additional terrestrial or flying base station.
The wireless network architecture is depicted in Figure 1, where a two-tier heterogeneous network is formed, operating in two different and non-overlapping spectrum bands. From now on, the two modes of communication are referred to as UFBS NOMA transmission, when the GMTs receive the data directly from the UFBS through the NOMA scheme, and D2D cooperative transmission, when the GMTs cooperate to improve the overall communication quality. Concerning the UFBS NOMA transmission, all GMTs are served by the UFBS via the air-to-ground (A2G) link, utilizing the NOMA technique according to an optimal power allocation and user pairing strategy [27,28]. More specifically, the total available UFBS bandwidth $B_u$ is divided into $K$ slots, equally distributed to the GMT pairs, as depicted in Figure 1. Each GMT pair $k$ ($1 \le k \le K$) consists of a strong ground terminal GMT$_i$ and a weak ground terminal GMT$_j$, with $i \neq j$, which share the same sub-channel in the frequency/time domain. The UFBS classifies the GMTs of each pair as either weak or strong based on the A2G channel conditions. Following the NOMA principle, in each pair of users the strong GMT$_i$ first decodes the signal of the weak GMT$_j$ from the received superposition-coded signal and then performs successive interference cancellation (SIC) to retrieve its own signal. Since the strong GMT thereby already knows the weak GMT's signal, a D2D cooperative transmission scheme on the ground can further enhance the communication quality of the weak users of the system. Concerning the D2D ground communication procedure, each strong GMT$_i$ decodes and forwards (DF) the received UFBS signal to the weak GMT$_j$ of its pair, thus providing reception diversity through ground assistance. Consequently, each weak GMT$_j$ receives two different copies of the same signal, one from the UFBS and the other from the strong GMT$_i$ of its pair, which acts as a relay.
From a technical standpoint, the communication system consists of $N = 2K$ GMTs, where $K$ is the number of GMT pairs, and a UFBS located in a circular region of interest $A$ of radius $R$. Each GMT$_l$ ($1 \le l \le N$) is randomly placed in the region of interest, and its location is expressed as $u_l = (x_{u_l}, y_{u_l}, z_{u_l}) \in A$. The 3D location of the UFBS is denoted as [...]. The UFBS is equipped with an antenna with transmit gain $G_t^u$ and total available transmit power $P_u$. Also, the downlink operating frequency of the UFBS is $F_u$. Furthermore, the operating frequency, the total available bandwidth, and the transmit power for the D2D transmission are denoted as $F_d$, $B_d$, and $P_d$, respectively. Moreover, the GMTs are equipped with two antennas, one for the reception of the UFBS signals with reception gain $G_r^u$, and the other for D2D communication, i.e., for transmission and reception, with transmit and receive gain $G_t^d = G_r^d$. We consider that the common antenna for transmission and reception regarding D2D communication is implemented through a radio frequency (RF) switch. Hence, each GMT can only transmit or receive during a D2D frequency/time slot. Finally, the seamless communication between the UFBS and the GMTs requires a reliable and efficient backhaul network. In this regard, we propose the use of a zero-touch commissioning (ZTC) cloud radio access network (C-RAN) for the UAV backhaul, as it can provide efficient and automated network management [25,29]. The ZTC-C-RAN model comprises a control element that performs the ZTC procedures, including the instantiation, configuration, and synchronization of the UAV and D2D cooperative network as well as the placement of the UFBS in the region of interest $A$. Furthermore, the proposed ZTC-C-RAN benefits from satellite communication as a backhaul relay between the UFBS and the control center, providing ultra-reliable low latency communication (URLLC) and enhanced mobile broadband (eMBB) network slices responsible for routing the control and data plane information to the terrestrial and aerial segments of the proposed scheme.

Air-to-Ground and Device-to-Device Channels
The channel between the UFBS and its associated GMTs is characterized as an A2G channel. To conduct performance analysis, the complex channel coefficient for each GMT$_l$ ($1 \le l \le N$) is denoted as $h_l^u$ and follows the complex Gaussian distribution with zero mean and unit variance, $h_l^u \sim \mathcal{CN}(0, 1)$. Additionally, the path loss attenuation of the UFBS signal is modeled using the elevation angle-based path loss model [25] in an urban environment, and is represented as follows:

$$PL_l^u = FSL_l + P_{LoS}\,\eta_{LoS} + (1 - P_{LoS})\,\eta_{NLoS}, \tag{1}$$

where $FSL_l = 20 \log_{10}\!\left(\frac{4 \pi d_l F_u}{c}\right)$ is the free space path loss, $d_l$ is the transmission distance between the UFBS and each GMT$_l$ ($1 \le l \le N$), and $c$ is the speed of light. In addition, the $\eta_{LoS}$ and $\eta_{NLoS}$ coefficients reflect the extra losses for LoS and non-LoS (NLoS) air-to-ground transmission links, and they depend on the propagation environment. Moreover, $P_{LoS}$ denotes the probability of the LoS component between the UFBS and each GMT$_l$ and is modeled as a function of the altitude $h$ of the UFBS and the 2D Euclidean distance $r_l$ between the UFBS and each GMT$_l$. Hence, $P_{LoS}$ can be expressed as follows [30]:

$$P_{LoS} = \frac{1}{1 + a \exp\!\left(-b\left(\frac{180}{\pi}\arctan\!\left(\frac{h}{r_l}\right) - a\right)\right)}, \tag{2}$$

where $a$, $b$ are parameters determined by the propagation environment. Regarding the D2D link between the strong GMT$_i$ and the weak GMT$_j$ of each pair $k$ ($1 \le k \le K$), the multipath fading is modeled by the complex Gaussian distribution with zero mean and unit variance, $\mathcal{CN}(0, 1)$. The complex channel coefficient for the D2D link is denoted as $h_k^d$. Moreover, the path loss model for the D2D communication of each pair $k$ ($1 \le k \le K$), adopted from [27], is as follows: where $d_k^e$ is the distance in km between the strong GMT$_i$ and the weak GMT$_j$ of each pair $k$ ($1 \le k \le K$). Furthermore, the A2G and the D2D links under consideration are assumed to be degraded by additive white Gaussian noise (AWGN), which is statistically modeled by the normal distribution $\mathcal{N}(0, \sigma_q^2)$ with $q \in \{u, d\}$. The noise powers of the A2G and D2D receivers are given by $N_u = k_B T_u B_u$ and $N_d = k_B T_d B_d$, respectively, where $k_B$ is the Boltzmann constant, and $T_u$, $T_d$ are the A2G and D2D receiver system noise temperatures, respectively. Therefore, the corresponding noise standard deviations for each receiver type are $\sigma_u = \sqrt{N_u}$ and $\sigma_d = \sqrt{N_d}$.
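As an illustration of the A2G model described in this subsection, the following Python sketch evaluates the LoS probability, the average path loss, and the thermal noise power. It assumes the commonly used elevation-angle form in which the excess losses are weighted by the LoS probability. The default values of `a`, `b`, `eta_los`, and `eta_nlos` are illustrative urban-type values, and all function names are hypothetical; none of them are taken from this paper.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0   # m/s
BOLTZMANN = 1.380649e-23         # J/K


def los_probability(h, r, a=9.61, b=0.16):
    """LoS probability for UFBS altitude h and 2D distance r (both in m).

    a, b are environment parameters; the defaults are assumed values.
    """
    theta_deg = math.degrees(math.atan2(h, r))  # elevation angle in degrees
    return 1.0 / (1.0 + a * math.exp(-b * (theta_deg - a)))


def a2g_path_loss_db(h, r, f_hz, eta_los=1.0, eta_nlos=20.0, a=9.61, b=0.16):
    """Average A2G path loss in dB: free-space loss plus weighted excess loss."""
    d = math.hypot(h, r)  # 3D distance between UFBS and GMT
    fsl = 20.0 * math.log10(4.0 * math.pi * d * f_hz / SPEED_OF_LIGHT)
    p_los = los_probability(h, r, a, b)
    return fsl + p_los * eta_los + (1.0 - p_los) * eta_nlos


def noise_power_w(temperature_k, bandwidth_hz):
    """Thermal noise power N = k_B * T * B in watts."""
    return BOLTZMANN * temperature_k * bandwidth_hz


if __name__ == "__main__":
    print(los_probability(100.0, 200.0))
    print(a2g_path_loss_db(100.0, 200.0, 2e9))
    print(noise_power_w(290.0, 1e6))
```

Raising the UFBS at a fixed horizontal distance increases the elevation angle and therefore the LoS probability, while also lengthening the link; this trade-off is what the altitude selection later exploits.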

Transmission and Reception Structure
As previously stated, the UFBS forms $K$ user pairs, where each pair $k$ ($1 \le k \le K$) consists of one strong GMT$_i$ and one weak GMT$_j$. Therefore, the wireless communication system under consideration comprises $K$ strong GMTs and $K$ weak GMTs ($2K$ GMTs in total). Additionally, we assume that the UFBS transmits to the $N$ GMTs without any delays. Such an assumption is acceptable for a broadcast system in which the UFBS transmits the information repeatedly, and the GMTs get this information immediately. Thus, the superimposed NOMA signal, transmitted to each pair $k$ by the UFBS, is expressed as:

$$x_k^u = \sqrt{\alpha_i P_u}\, s_i + \sqrt{\alpha_j P_u}\, s_j, \tag{4}$$

where $s_i, s_j \in \mathbb{C}$ are the signals of GMT$_i$ and GMT$_j$, respectively. Also, $\alpha_i$ and $\alpha_j$ denote the fractions of the total UFBS transmit power $P_u$ allocated to each GMT, with $\alpha_i + \alpha_j = 1$. The signals received by the strong GMT$_i$ and the weak GMT$_j$ for each pair $k$ are obtained as follows: where $z_u \sim \mathcal{N}(0, \sigma_u^2)$ represents the AWGN of the A2G link. Simultaneously, the received signal at the weak GMT$_j$ when the D2D cooperative transmission is activated is given by the following expression: where $z_d \sim \mathcal{N}(0, \sigma_d^2)$ stands for the AWGN in the D2D link. Since we have considered the decode and forward (DF) operation regarding the D2D links, the strong user GMT$_i$ of each pair $k$ immediately decodes the received UFBS NOMA signal $x_k^u$ and then estimates the weak user's signal $s_j$. Subsequently, the strong user GMT$_i$ forwards $s_j$ to the weak user GMT$_j$ by transmitting the signal:

Signal-to-Interference-Plus-Noise Ratio (SINR) Analysis
In general, for each GMT$_l$ ($1 \le l \le N$) in the considered communication system, the A2G channel gain is calculated as: including additional gains, losses, and the noise power of the UFBS receiver $N_u$. Hence, using (5), the instantaneous signal-to-noise ratio (SNR) $\gamma_i^u$ of the strong GMT$_i$ to detect its own signal $s_i$, assuming perfect SIC, is given as follows: where $\Gamma_i^u$ is the A2G channel gain of the strong GMT$_i$, which involves the noise power of the UFBS receiver $N_u$, as can be observed in (9). Furthermore, the instantaneous signal-to-interference-plus-noise ratio (SINR) $\gamma_k^u$ for detecting the signal $s_j$ of the weak user GMT$_j$ at the strong user GMT$_i$ is expressed as: Moreover, the SINR $\gamma_j^u$ at the weak user GMT$_j$ for detecting its own signal $s_j$ from the UFBS is obtained by: where $\Gamma_j^u$ is the A2G channel gain for the weak GMT$_j$. Furthermore, the SINR $\gamma_k^d$ at the weak user GMT$_j$ for detecting its signal, which is relayed by the strong user GMT$_i$ in the same pair $k$, equals: where $\Phi_k$ is the channel gain of the D2D link between the strong GMT$_i$ and the weak GMT$_j$ belonging to the same NOMA pair $k$ ($1 \le k \le K$) and is expressed as:
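To make the roles of the three A2G SINR quantities concrete, the sketch below computes them for one NOMA pair under the textbook two-user power-domain NOMA model. This is an assumption, not necessarily the exact form used in this paper: the channel gains are taken to be already normalized by the receiver noise power, so that `p_u * gain` is a linear SNR, and the function name `noma_sinrs` is hypothetical.

```python
def noma_sinrs(alpha_i, p_u, gain_strong, gain_weak):
    """Return (gamma_i, gamma_k, gamma_j) for one NOMA pair.

    alpha_i     : power fraction of the strong user (0 < alpha_i < 1)
    p_u         : total transmit power (linear scale)
    gain_strong : noise-normalized A2G gain of the strong user (assumed)
    gain_weak   : noise-normalized A2G gain of the weak user (assumed)
    """
    if not 0.0 < alpha_i < 1.0:
        raise ValueError("alpha_i must lie strictly between 0 and 1")
    alpha_j = 1.0 - alpha_i
    snr_strong = p_u * gain_strong
    snr_weak = p_u * gain_weak
    # Strong user detects its own signal after perfect SIC: no interference.
    gamma_i = alpha_i * snr_strong
    # Strong user decodes the weak user's signal, treating its own as noise.
    gamma_k = alpha_j * snr_strong / (alpha_i * snr_strong + 1.0)
    # Weak user decodes its own signal, treating the strong user's as noise.
    gamma_j = alpha_j * snr_weak / (alpha_i * snr_weak + 1.0)
    return gamma_i, gamma_k, gamma_j


if __name__ == "__main__":
    print(noma_sinrs(0.2, 1.0, 100.0, 10.0))
```

Under these assumptions the weak user's signal is always decoded with a higher SINR at the strong user than at the weak user itself, which is the condition that makes SIC feasible.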

Achievable Rates Analysis
As the SINR expressions of the strong GMT$_i$ and the weak GMT$_j$ for each pair $k$ have been determined, it is straightforward to compute the corresponding achievable rates. The theoretical achievable rate of each GMT$_l$, when we consider a conventional UAV-OMA transmission scheme, can be mathematically expressed as: In contrast, in the case of a UAV-NOMA scheme, the maximum downlink NOMA achievable rates attained by the strong GMT$_i$ and the weak GMT$_j$ through the A2G channel are: respectively. Moreover, for the strong GMT$_i$, the achievable rate of the weak GMT$_j$'s signal is equal to: Also, the maximum achievable rate $R_k^d$ concerning the established D2D link between the strong user GMT$_i$ and the weak user GMT$_j$ is expressed as: Since the weak GMT$_j$ can receive its signal directly from the UFBS or via the strong GMT$_i$ of its pair using the D2D communication capabilities, GMT$_j$ always chooses to be served by the link that offers the highest achievable rate. Thus, the maximum achievable rate of each weak GMT$_j$ that belongs to the NOMA pair $k$, combining the UAV-NOMA with the cooperative D2D scheme, can be calculated as follows: where $\Lambda_j$ is the achievable rate through the D2D communication with the strong GMT$_i$. In fact, the weak GMT$_j$'s signal is decoded at the strong GMT$_i$, and the D2D communication provides the channel to forward this decoded signal from the strong GMT$_i$ to the weak GMT$_j$. As a result, the weak GMT$_j$ can never receive a rate greater than $R_k^u$ over the relayed link, meaning that $\Lambda_j \le R_k^u$. Essentially, the quality of the D2D communication determines whether the weak GMT$_j$ enjoys the maximum possible rate $R_k^u$ or less. Specifically, we can recognize the following cases:
Case 1. The D2D channel is profitable for the weak user, i.e., $R_k^d \ge R_k^u$, and the achievable rate of the weak user is $\Lambda_j = R_k^u$. This happens because the weak user can never receive a rate greater than the achievable decoding rate of its signal at the strong user.

Case 2. The D2D channel is not profitable for the weak user, i.e., $R_k^d < R_k^u$, and the achievable rate of the weak user is equal to the transmission rate that the D2D communication can provide, i.e., $\Lambda_j = R_k^d$. In this case, the achievable rate of the weak user is limited by the capabilities of the D2D communication channel.
Based on the above cases concerning the use of D2D communication for receiving the signal at the weak user, we observe that the minimum of the achievable rates $R_k^d$ and $R_k^u$ is always selected. Therefore, in the case where D2D communication is used, the achievable rate $\Lambda_j$ of the weak user in (20) is equal to:

$$\Lambda_j = \min\left\{R_k^d,\; R_k^u\right\}.$$

Utilizing the UAV-NOMA and D2D-aided scheme, the total sum rate achieved by each pair $k$ is equal to: Therefore, the total system sum rate that can be achieved by utilizing the aforementioned cooperative scheme is:
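The link selection described above, where the relayed rate is capped by the decoding rate at the strong user and the weak user then picks the better of the direct and relayed links, can be sketched in a few lines of Python. The function and argument names are hypothetical, and all rates are assumed to be in the same units.

```python
def weak_user_rate(r_direct, r_decode_at_strong, r_d2d):
    """Achievable rate of the weak user in one NOMA pair.

    r_direct           : rate of the weak user's signal over the direct A2G link
    r_decode_at_strong : rate at which the strong user decodes the weak user's signal
    r_d2d              : rate supported by the D2D link from strong to weak user
    """
    # Relayed link: bottlenecked by the weaker of the two hops (Cases 1 and 2).
    relayed = min(r_d2d, r_decode_at_strong)
    # The weak user is served by whichever link offers the higher rate.
    return max(r_direct, relayed)


if __name__ == "__main__":
    print(weak_user_rate(1.0, 3.0, 5.0))  # Case 1: D2D link is not the bottleneck
    print(weak_user_rate(1.0, 3.0, 2.0))  # Case 2: D2D link limits the relayed rate
```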

User Pairing Policy
So far, we have noted that the system's GMTs are separated into $K$ groups of two members each, but we have not specified how the GMTs are allocated to each group. Hence, in this subsection, we propose the maximum weight perfect matching (MWPM) pairing policy, which takes into account both the A2G and D2D channel conditions. The primary objective is to maximize the system's total sum rate. Therefore, a matching technique must be implemented between the GMTs in order to discover those user pairs that optimize the system's overall sum rate. The MWPM method considers the $\binom{N}{2}$ candidate pairings between the $N$ GMTs and retains the $K$ pairs that maximize the system sum rate. For this purpose, it is necessary to define a binary matrix $\Theta$ that represents the pairing relationship between the GMTs as follows:

$$\theta_{i,j} = \begin{cases} 1, & \text{if GMT}_i \text{ is paired with GMT}_j, \\ 0, & \text{otherwise.} \end{cases}$$

The dimension of the pairing matrix $\Theta$ that is retrieved from the MWPM method is $N \times N$. Moreover, the diagonal elements of the pairing matrix $\Theta$ are all equal to zero because a GMT cannot be paired with itself. Also, since the matrix components $\theta_{i,j}$ and $\theta_{j,i}$ both pertain to the same GMT pairing, it holds that $\theta_{i,j} = \theta_{j,i}$. Therefore, the MWPM pairing policy can be expressed as the following maximization problem: The maximization problem (25) can be regarded as a matching problem in a fully connected undirected graph $G(V, E)$, where the total number of vertices is equal to the total number of GMTs, $|V| = N$, and $E$ is the set of all feasible edges $\theta_{i,j}$, connecting all users to each other with $i \neq j$ and $i, j \in \{1, 2, \ldots, N\}$. In order to solve this problem optimally, we use the Blossom algorithm to obtain an optimal pairing strategy between the GMTs [31].
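For intuition, the pairing problem can be solved by exhaustive search when $N$ is small. The sketch below is not the Blossom algorithm used in the paper: it simply enumerates all perfect matchings recursively, which is only practical for a handful of users, and is meant as a reference against which a Blossom-based solver could be checked. The symmetric matrix `pair_rate`, holding a hypothetical pair sum rate for every candidate pair, and the function name are assumptions.

```python
def best_pairing(pair_rate):
    """Exhaustively find the perfect matching with the largest total weight.

    pair_rate[i][j] is the (assumed, symmetric) sum rate obtained when
    user i is paired with user j. Returns (total_rate, list_of_pairs).
    """
    n = len(pair_rate)
    if n % 2 != 0:
        raise ValueError("a perfect matching needs an even number of users")

    def solve(remaining):
        # remaining is a tuple of user indices that are still unpaired
        if not remaining:
            return 0.0, []
        first, rest = remaining[0], remaining[1:]
        best_total, best_pairs = float("-inf"), []
        for idx, partner in enumerate(rest):
            sub_total, sub_pairs = solve(rest[:idx] + rest[idx + 1:])
            total = pair_rate[first][partner] + sub_total
            if total > best_total:
                best_total = total
                best_pairs = [(first, partner)] + sub_pairs
        return best_total, best_pairs

    return solve(tuple(range(n)))


if __name__ == "__main__":
    rates = [
        [0, 5, 1, 1],
        [5, 0, 1, 1],
        [1, 1, 0, 5],
        [1, 1, 5, 0],
    ]
    print(best_pairing(rates))
```

The number of perfect matchings grows as $(N-1)!!$, which is why a polynomial-time method such as the Blossom algorithm is preferred for realistic numbers of GMTs.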

Power Allocation Strategy
Concerning the UFBS NOMA transmission, the objective is to maximize the sum rate of each pair of GMTs under the condition that each GMT achieves at least the rate it would obtain with the conventional UFBS OMA transmission. This optimization problem is mathematically expressed as follows: The solution to this problem has been obtained in [26,27] by identifying the optimal value of $\alpha_i$, as: To conclude, Table 1 lists the definitions of most of the parameters involved in this study.
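The structure of this problem can be illustrated under simplifying assumptions that are ours and not necessarily those of [26,27]: bandwidth is normalized, the OMA benchmark gives each user half of the pair's sub-channel, and the SNRs below are noise-normalized linear values of the form $P_u \Gamma$. Under these assumptions the pair sum rate grows with $\alpha_i$, so the optimum sits where the weak user's constraint becomes tight, which gives $\alpha_i = (\sqrt{1 + \mathrm{SNR}_j} - 1)/\mathrm{SNR}_j$. The sketch below computes this value and the resulting rates; all names are hypothetical.

```python
import math


def alpha_strong(snr_weak):
    """Largest strong-user power fraction that keeps the weak user at its OMA rate."""
    return (math.sqrt(1.0 + snr_weak) - 1.0) / snr_weak


def noma_rates(alpha_i, snr_strong, snr_weak):
    """Normalized NOMA rates (bit/s/Hz) of the strong and weak user."""
    rate_strong = math.log2(1.0 + alpha_i * snr_strong)
    rate_weak = math.log2(1.0 + (1.0 - alpha_i) * snr_weak / (alpha_i * snr_weak + 1.0))
    return rate_strong, rate_weak


def oma_rate(snr):
    """Normalized OMA rate when the user gets half of the sub-channel (assumed)."""
    return 0.5 * math.log2(1.0 + snr)


if __name__ == "__main__":
    snr_i, snr_j = 100.0, 10.0
    a = alpha_strong(snr_j)
    print(a, noma_rates(a, snr_i, snr_j), oma_rate(snr_i), oma_rate(snr_j))
```

With this choice the weak user obtains exactly its OMA rate, while the strong user obtains at least its OMA rate whenever its SNR is not smaller than the weak user's.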

UFBS Placement Procedure
In this section, we analyze the placement procedure of the UFBS in the region of interest A. For this purpose, we propose an UFBS placement procedure that is divided into two sub-processes. The first sub-process aims to find the 2D plane position of the UFBS. For this purpose, k-means and k-medoids algorithms are exploited and assessed [7,9]. The second sub-process seeks to discover the UFBS's height aiming to improve coverage and communication quality, thus determining its location in the three-dimensional space.

k-Means Analysis and Setup
This sub-subsection describes the UFBS 2D placement procedure utilizing the k-means algorithm. In more detail, the k-means algorithm is fed with the coordinates $u_l$ ($1 \le l \le N$) of all GMTs located within the region of interest $A$, which form the set $U$. Subsequently, the algorithm groups the users into a cluster and returns as output the centroid point $p_1^c$. The goal of the k-means method is to minimize the centroid-point to group distances metric, expressed as $\sum_{u_l \in U} \lVert u_l - p_1^c \rVert^2$. In particular, this expression represents the objective function of the following minimization problem:

$$\min_{p_1^c} \; \sum_{u_l \in U} \lVert u_l - p_1^c \rVert^2. \tag{28}$$

Therefore, the UFBS should be placed at $p_1^c$ to achieve improved communication quality. The operation of the 2D UFBS placement process using the k-means algorithm is summarized in Algorithm 1.
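Algorithm 1. 2D UFBS placement using k-means (partial listing)
    [...]
    end for
    for $k \leftarrow 1$ to $\Upsilon$ do
        [...]
    end for
    $t = t + 1$
  until $\lVert C^t - C^{t-1} \rVert \le \epsilon$
  output: the set of centroid points $C^t$ at which the $\Upsilon$ UFBSs will be deployed.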
For simplicity, it is assumed that the number of UFBSs is $\Upsilon = 1$. However, as shown in Algorithm 1, the k-means algorithm can be straightforwardly applied to scenarios with $\Upsilon > 1$. Hence, in our case, the centroid $p_1^c$ is given by the following three steps:
Step 1: Determine the coordinate $Y_u$ of the UFBS as the empirical mean of the GMTs' $y$-coordinates: $Y_u = \frac{1}{N} \sum_{l=1}^{N} y_{u_l}$.
Step 2: Determine the coordinate $X_u$ of the UFBS as the empirical mean of the GMTs' $x$-coordinates: $X_u = \frac{1}{N} \sum_{l=1}^{N} x_{u_l}$.
Step 3: Configure the point $p_1^c$ at which the UFBS should be placed as $p_1^c = (X_u, Y_u, h)$, where $h$ is the initial height of the UFBS before the 3D UFBS placement procedure.
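The three steps above amount to taking the coordinate-wise mean of the GMT positions, which is the k-means solution for a single cluster. A minimal Python sketch follows; the function name `kmeans_single_centroid` and the example coordinates are hypothetical.

```python
def kmeans_single_centroid(gmt_xy, initial_height):
    """Return (X_u, Y_u, h) for a single UFBS: the mean of the GMT coordinates.

    gmt_xy         : list of (x, y) ground coordinates of the GMTs
    initial_height : UFBS height before the 3D placement step
    """
    if not gmt_xy:
        raise ValueError("at least one GMT location is required")
    n = len(gmt_xy)
    y_u = sum(y for _, y in gmt_xy) / n  # Step 1
    x_u = sum(x for x, _ in gmt_xy) / n  # Step 2
    return (x_u, y_u, initial_height)    # Step 3


if __name__ == "__main__":
    users = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
    print(kmeans_single_centroid(users, 100.0))
```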
Finally, it is essential to acknowledge that the choice of the optimal number of clusters for a clustering problem is not straightforward and may be influenced by a range of factors, including the specific requirements and objectives of the analysis, as well as the inherent properties of the data. Within the context of our system model, the user locations are randomly distributed within a circular region of interest, forming a single cluster. This characteristic of the data renders the choice of Υ equal to 1 in k-means clustering a sensible and appropriate decision, as it adequately captures the underlying structure of the data. The resulting cluster is representative of the overall distribution of users and adequately reflects the inherent properties of the dataset. In this particular scenario, using a single cluster is sufficient to accurately and effectively represent the nature of the user distribution and therefore is a suitable approach to analyze the data [32].

k-Medoids Analysis and Setup
In this sub-subsection, the basic principles of the k-medoids algorithm are presented. The k-medoids method can be used for the 2D placement of the UFBS in $A$ in the same fashion as k-means. However, the way that the UFBS placement point $p_1$ is selected differs between the two approaches. As previously stated, in the k-means UFBS placement scheme, the centroid point $p_1^c$ is the empirical mean of the coordinates $U$ of the GMTs in $A$. However, in k-medoids, it is the location of one of the actual GMT$_l$ ($1 \le l \le N$), and it is called the medoid point $p_1^m$. Specifically, in k-means, the point-to-group-centroid distance is assessed with respect to a virtual point $p_1^c \in A$, while in k-medoids, it is measured with respect to one of the actual data points $u_l \in A$, i.e., $p_1^m = u_l$ for some $1 \le l \le N$, which is an actual GMT location. Similarly to the k-means algorithm, the goal of the k-medoids method is to minimize the medoid-point to group distances metric, i.e., the sum over all $u_l \in U$ of the distances between $u_l$ and $p_1^m$. The operation of the 2D UFBS placement process using the k-medoids algorithm is summarized in Algorithm 2.
As with k-means, it is assumed that the number of UFBSs is $\Upsilon = 1$. However, as shown in Algorithm 2, the k-medoids algorithm can be straightforwardly applied to scenarios with $\Upsilon > 1$. Additionally, Algorithm 3 is the modified version of Algorithm 2 for the special case where $\Upsilon = 1$.
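For a single UFBS, the medoid can be found by direct search: every GMT location is tried as the candidate and the one with the smallest total distance to all other GMTs is kept. The sketch below is an illustration of that special case and not a transcription of Algorithm 3; it uses the unsquared Euclidean distance as the dissimilarity, which is an assumption, and the function name is hypothetical. The example contains one remote user to show how the medoid stays inside the main group.

```python
import math


def single_medoid(gmt_xy):
    """Return the GMT location that minimizes the sum of distances to all GMTs."""
    if not gmt_xy:
        raise ValueError("at least one GMT location is required")

    def total_distance(candidate):
        return sum(math.dist(candidate, other) for other in gmt_xy)

    return min(gmt_xy, key=total_distance)


if __name__ == "__main__":
    # Four users close together and one remote user (outlier).
    users = [(0, 0), (1, 0), (0, 1), (1, 1), (100, 100)]
    medoid = single_medoid(users)
    mean = (sum(x for x, _ in users) / len(users),
            sum(y for _, y in users) / len(users))
    print("medoid:", medoid)
    print("mean  :", mean)
```

In this example the coordinate-wise mean is pulled far away from the four nearby users by the single outlier, whereas the medoid remains one of the four.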

3D UFBS Placement
Following the determination of the UFBS's 2D deployment location, the 3D UFBS placement procedure adjusts the UFBS's altitude to provide the highest quality of service to the GMTs within the area of interest $A$. Thus, the GMT$_l$ farthest from the point $p_1$ where the UFBS is finally placed should be identified, according to the horizontal two-dimensional distance $r_l$. After that, the suitable height at the point $p_1$ is found by solving the following equation using (1): For the considered A2G path-loss model, as the altitude of the UFBS increases, the path loss initially decreases and then increases again. This behavior can be attributed to the dependence of the particular A2G model on the elevation angle and the distance between the UFBS and each GMT$_l$. As the height of the UFBS increases, the elevation angle also increases, leading to an increased probability of line-of-sight, i.e., obstruction by buildings and other surrounding objects is reduced. Based on this behavior, the A2G path loss function $PL_l^u$ is convex [25]. Thus, it can be deduced that the global minimum is located at the critical point, which can be derived through Equation (30).

Algorithm 2. 2D UFBS placement using k-medoids (partial listing)
  [...] $= 10^{-6}$
  $t = 0$
  Initialize $\Upsilon$ medoid points $C^t = \{p_1^m, p_2^m, \ldots, p_\Upsilon^m\} \subseteq U$, randomly
  $S_k = \emptyset, \; \forall\, k = 1, 2, \ldots, \Upsilon$
  for $i \leftarrow 1$ to $N$ do
      [...]
  repeat
      for $k \leftarrow 1$ to $\Upsilon$ do
          for $i \leftarrow 1$ to $N$ do
              if $u_i \notin C^t$ then
                  Swap the role of $C_k^t$ with $u_i$
  [...]
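The altitude selection of the 3D UFBS placement can also be approximated numerically, without solving the critical-point equation in closed form, by scanning candidate altitudes and keeping the one with the smallest path loss toward the farthest GMT. The sketch below assumes the elevation-angle path loss form with excess losses weighted by the LoS probability, and uses illustrative urban-type parameter values; the parameter defaults, the grid, and the function names are assumptions and are not taken from this paper.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def path_loss_db(h, r, f_hz, a=9.61, b=0.16, eta_los=1.0, eta_nlos=20.0):
    """Average A2G path loss (dB) for altitude h and horizontal distance r in m."""
    theta_deg = math.degrees(math.atan2(h, r))
    p_los = 1.0 / (1.0 + a * math.exp(-b * (theta_deg - a)))
    fsl = 20.0 * math.log10(4.0 * math.pi * math.hypot(h, r) * f_hz / SPEED_OF_LIGHT)
    return fsl + p_los * eta_los + (1.0 - p_los) * eta_nlos


def best_altitude(r_farthest, f_hz, h_min=10.0, h_max=2000.0, step=5.0):
    """Grid search for the altitude that minimizes path loss to the farthest GMT."""
    best_h, best_pl = h_min, path_loss_db(h_min, r_farthest, f_hz)
    h = h_min + step
    while h <= h_max:
        pl = path_loss_db(h, r_farthest, f_hz)
        if pl < best_pl:
            best_h, best_pl = h, pl
        h += step
    return best_h, best_pl


if __name__ == "__main__":
    print(best_altitude(500.0, 2e9))
```

A grid search is used here only because it makes the decrease-then-increase behavior of the path loss easy to inspect; a root-finding routine applied to the derivative would serve the same purpose.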

Computational Complexity of k-Means and k-Medoids Algorithms
Another crucial aspect is to estimate the computational complexity of the examined k-means and k-medoids algorithms based on their respective methods as described in Algorithms 1 and 2, respectively. K-means is a centroid-based algorithm, and k-medoids is a medoid-based algorithm.
The computational complexity of the k-means algorithm has been proven to be $O(nkId)$, where $n$ is the number of data points, $k$ is the number of clusters, $I$ is the number of iterations, and $d$ is the number of dimensions [33]. It uses the mean of the data points to calculate the cluster centroid and updates the assignment of the data points to the closest cluster centroid. The algorithm requires multiple iterations until convergence. Its time complexity is therefore affected by the number of data points, clusters, iterations, and dimensions.
The computational complexity of the k-medoids algorithm has been proven to be $O(k(n-k)^2 I)$, where $n$ is the number of data points, $k$ is the number of clusters, and $I$ is the number of iterations [34]. K-medoids selects a single data point as the representative of a cluster, known as the medoid, and updates the assignment of the data points to the closest medoid. The algorithm requires multiple iterations until convergence. Its time complexity is affected by the number of data points, clusters, and iterations, and by the distance metric used.
In summary, both algorithms have polynomial time complexity, and the main difference is that k-means uses centroids, whereas k-medoids uses medoids, as cluster centers. As a result, k-means is sensitive to the initial choice of centroids, while k-medoids is less sensitive and tends to find the global optimum more quickly.
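To give a feel for the two complexity expressions, the following sketch evaluates $nkId$ and $k(n-k)^2 I$ for one hypothetical problem size. The values of `n`, `k`, `iters`, and `dims` are assumptions chosen for illustration, and the results are operation-count proxies with constants ignored, not measured run times.

```python
def kmeans_ops(n, k, iters, dims):
    """Operation-count proxy for k-means, following O(n k I d)."""
    return n * k * iters * dims

def kmedoids_ops(n, k, iters):
    """Operation-count proxy for k-medoids, following O(k (n - k)^2 I)."""
    return k * (n - k) ** 2 * iters

# Hypothetical scenario: 20 GMTs, one UFBS, 10 iterations, 2D coordinates
n, k, iters, dims = 20, 1, 10, 2
print(kmeans_ops(n, k, iters, dims))  # 400
print(kmedoids_ops(n, k, iters))      # 3610
```

The quadratic dependence on the number of data points is what separates the two expressions as the number of GMTs grows.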

ANN-Based Placement Scheme Selection
The main difference between the two algorithms discussed above is that the virtual centroid point $p^c_1 \in A$ given by k-means, where the UFBS will be placed, will be equidistant from all GMTs. Conversely, the medoid point $p^m_1 \in A$ given by k-medoids will be a GMT location within the region of interest that minimizes the objective function (see (29)). Consequently, if the GMTs are spread evenly over the area of interest, the point $p^c_1$ provided by k-means will improve the channel quality of the GMTs, since their distances from the UFBS will be almost identical and the LoS probability will be high. On the contrary, if a GMT is remote (an outlier), the k-means algorithm will try to find a point $p^c_1$ equidistant from every GMT, pulling it away from the majority of GMTs and thus increasing the GMTs' propagation losses. In that case, k-medoids, through the proposed point $p^m_1$, reduces the point-to-group distances, achieving higher A2G channel gains and increasing the QoS of the overall system.
To better highlight the advantages of each algorithm, let us consider a toy network with GMTs located in the 2D plane, as depicted in Figure 2. Focusing on the right part of Figure 2, most of the GMTs form a cluster, while the rightmost GMT is an outlier. The point $p^c_1 \in A$ proposed by k-means is greatly influenced by the outlier and thus cannot represent the correct cluster center. In contrast, the medoid point $p^m_1 \in A$ provided by k-medoids is robust to the outlier and correctly represents the cluster center. Regarding the left part of Figure 2, on the other hand, there is no remote GMT, and all GMTs are close to each other, forming a single cluster. Consequently, the point $p^c_1 \in A$ proposed by k-means is equidistant from all GMTs, thus increasing the channel gain compared to the point $p^m_1 \in A$ offered by the k-medoids algorithm, which is not equidistant from all GMTs.
Motivated by this observation, the PSS can be regarded as a supervised classification problem, which can be approached using a fully connected artificial neural network (ANN) to enhance the overall system QoS. Since an ANN model learns to match predictions to patterns seen during training, a data set containing various features that affect the A2G transmission should be created. To this end, this section presents the data set generation procedure, the data pre-processing, and the hyperparameter tuning of the ANN model.
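The outlier effect discussed for the toy network can be reproduced with a short sketch. The coordinates below are hypothetical and are not those of Figure 2; four GMTs form a tight cluster and a fifth one is remote.

```python
import math

def centroid(points):
    """Empirical mean of the coordinates (k-means placement for one cluster)."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def medoid(points):
    """Data point with the smallest summed distance to all points (k-medoids placement)."""
    return min(points, key=lambda c: sum(math.dist(c, u) for u in points))

cluster = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]  # hypothetical clustered GMTs
outlier = (10.0, 10.0)                                       # hypothetical remote GMT
gmts = cluster + [outlier]

p_c = centroid(gmts)
p_m = medoid(gmts)
cluster_center = centroid(cluster)

# Distance of each placement point from the center of the actual cluster
shift_centroid = math.dist(p_c, cluster_center)
shift_medoid = math.dist(p_m, cluster_center)
print(p_c, p_m, round(shift_centroid, 3), round(shift_medoid, 3))
```

In this example the centroid is pulled toward the remote GMT, while the medoid stays on a GMT inside the cluster, in line with the qualitative argument above.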

Data Set Generation
In this subsection, the dataset generation procedure for training the ANN model is presented. The objective of the ANN model is to predict the UFBS placement method that enhances the overall communication quality according to a specific key performance indicator (KPI). In this work, the KPI to be improved is the total system sum rate, $R_s$, given in expression (23). Hence, Equation (23) expresses the objective function that the ANN model seeks to maximize. This can be achieved through the ability of a well-trained ANN model to recognize patterns that indicate when each method should be applied to achieve the highest system sum rate. Using Equation (23) as the KPI for dataset generation ensures that the generated data are relevant and valuable for training and evaluating ANN models. Furthermore, incorporating a KPI directly aligned with the problem being addressed helps ensure that the model is configured appropriately for the targeted classification task and performs well on the specific problem [35]. Hence, considering the k-means and the k-medoids algorithms, the ANN should determine which of these two UFBS placement methods will achieve the highest $R_s$. Furthermore, the calculation of $R_s$ involves various transmission parameters of the considered wireless communication system presented in Section 2, such as the 3D location of the UFBS, as well as the A2G propagation model. Therefore, all these aspects should be carefully considered during the training procedure of the ANN model.
In general, optimizing the total system sum rate, i.e., $R_s$, can offer valuable insights into the optimal allocation of system resources, including bandwidth and transmit power [36]. In the context of a UAV-NOMA and D2D cooperative network, optimizing $R_s$ can help identify the most effective resource allocation strategies for achieving optimal system performance. For instance, optimizing the total system sum rate allows the cooperative scheme to allocate bandwidth and power to UAVs and D2D users so as to maximize the total data rate transmitted over a given period. In addition, this optimization process can consider the physical layer parameters of the UAV and D2D users, including their communication requirements. For example, UAVs may require higher power allocations to maintain stable connections due to their altitude. Additionally, the distance of D2D users from the UFBS can impact their channel conditions and overall communication performance. By considering these physical layer parameters during the optimization process, the system can allocate resources more efficiently and effectively, leading to improved overall performance. In summary, optimizing the sum rate in a UAV-NOMA and D2D cooperative network can help achieve the best use of resources and enhance the system's overall performance. It is noted that the proposed capacity-based optimization of $R_s$ can be considered as the upper bound on the maximum amount of data that can be reliably transmitted over a communication channel as the size of the channel goes to infinity. However, achieving this limit is often difficult in real-world scenarios due to practical constraints such as noise and interference in the channel.
Focusing on the data set generation process, Monte Carlo simulations were carried out in MATLAB (Version R2021a; MathWorks, Natick, MA, USA) to generate the entire training data set $D$, following the system model described in Section 2 and depicted in Figure 1. More specifically, in each simulated transmission frame, the GMTs are generated randomly, following the uniform distribution, within the circular region of interest, while the UFBS is placed through the two unsupervised algorithms mentioned above. It is noted that all GMTs are served by the UFBS via the A2G link, utilizing the NOMA technique, while the D2D cooperative transmission is activated to improve the overall communication quality. The A2G and D2D channel gains are generated based on expressions (9) and (14), respectively, while the urban environment parameters are given in Section 5. Concerning the dataset format, it can be expressed as $D = \{(x_i, y_i)\}$ with $i = 1, \ldots, d$, where $d$ is the total number of instances. Also, $x_i \in \mathbb{R}^w$ is the input vector of the $i$-th instance, comprising $w$ features, and $y_i \in \{\text{k-means}, \text{k-medoids}\}$ is the class of $x_i$. In the following, the input feature vector $x_i$ consists of eight parameters, i.e., $w = 8$, that affect the placement procedure of the UFBS and are presented in detail in Table 2. Moreover, to compute the class $y_i$, we evaluate the total system sum rate $R_s$ in each simulated frame (see Equation (23)) for each UFBS placement procedure. Thus, the class value of the $i$-th instance, $y_i$, is the placement method that achieved the highest $R_s$.
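The labeling rule for each simulated frame can be sketched as follows. The feature values and sum rates are hypothetical placeholders rather than simulation outputs, the function name is illustrative, and resolving a tie in favor of k-means is an assumption, since the text does not specify tie handling.

```python
def label_instance(features, rate_kmeans, rate_kmedoids):
    """Return (x_i, y_i), where y_i is the placement method with the higher sum rate R_s."""
    # Tie-breaking toward k-means is an assumption made for this sketch
    label = "k-means" if rate_kmeans >= rate_kmedoids else "k-medoids"
    return features, label

# Hypothetical frames: (8 input features, R_s with k-means, R_s with k-medoids)
frames = [
    ([0.1] * 8, 12.4, 11.9),
    ([0.7] * 8, 9.8, 10.6),
]
dataset = [label_instance(x, r_mea, r_med) for x, r_mea, r_med in frames]
print([y for _, y in dataset])
```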
To precisely train the ANN model and to prevent over-fitting and under-fitting issues, the entire data set D is divided into training, validation, and testing subsets using the data splitting approach. A popular strategy for data partitioning is to use 70-80% of the entire dataset for training, with the remaining proportion used to improve and assess the trained models. Consequently, 70% of the total samples are chosen for the training phase, 15% for validation, and the remaining 15% for testing the proposed ANN model [37]. The training set is used to train the ANN, the validation set is used to evaluate the performance of the ANN during training, and the testing set is used to evaluate the performance of the ANN after training.
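A 70/15/15 partition of this kind can be sketched as below. This is a plain random split with a fixed seed, written for illustration only; the function name, the seed, and the use of integer indices as stand-ins for data samples are assumptions.

```python
import random

def split_dataset(samples, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle the samples and split them into training, validation, and testing subsets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_frac * len(shuffled))
    n_val = round(val_frac * len(shuffled))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remaining samples
    return train, val, test

# Indices stand in for data samples (illustrative data set size)
train, val, test = split_dataset(range(3000))
print(len(train), len(val), len(test))
```

Each sample ends up in exactly one subset, so the testing set stays unseen during training and validation.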

Data Pre-Processing
The effectiveness of an ANN is highly dependent on the quantity and quality of training data. Consequently, regardless of which classifier is used, inferior models are generated if the training data are inaccurate. In light of this, stratified sampling and data normalization procedures are utilized to obtain the best performance of the ANN model.
As an essential data pre-processing step, instance selection is employed not only to cope with the infeasibility of learning from massive data sets, but also to reduce the risk of the ANN model being biased towards the majority class and to avoid what is known as the accuracy paradox [38]. For this purpose, stratified sampling is applied. Hence, the overall training set is reduced, and the class values are uniformly distributed in the training sets, as shown in Figure 3. After removing redundant instances per class value, 3000 data samples were retained, which corresponds to a 50% reduction of the initial 6000 raw data samples. In addition, an ANN model cannot attain optimal performance if the feature values are in different units and scales. To resolve this, a normalization technique is needed that eliminates the effects of those mismatches. Using this approach, the values of the dataset's features are scaled into a given range while keeping the original dataset's overall distribution and ratios. Hence, before the training phase, all input features were normalized as follows:
$$X_{norm} = \frac{X - X_{min}}{X_{max} - X_{min}} \tag{31}$$
where $X$ is a value of the corresponding feature under normalization, $X_{max}$ and $X_{min}$ are the maximum and the minimum value of this feature, respectively, and $X_{norm} \in [0, 1]$ is the final normalized value [37].
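The two pre-processing steps can be sketched in a few lines. The class-balancing routine below draws the same number of instances from every class at random, which is one simple way to obtain uniformly distributed class values and is an assumption about the exact selection rule; the function names, the `per_class` parameter, and the toy data are likewise hypothetical. The normalization follows the min-max scaling described in the text.

```python
import random

def balance_classes(samples, labels, per_class, seed=0):
    """Keep the same number of randomly chosen instances from every class."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    balanced = []
    for y, xs in by_class.items():
        balanced.extend((x, y) for x in rng.sample(xs, per_class))
    return balanced

def min_max_normalize(column):
    """Scale one feature column into [0, 1] using its minimum and maximum."""
    lo, hi = min(column), max(column)
    if hi == lo:               # constant feature: map every value to 0
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

# Toy imbalanced data: four "k-means" instances and two "k-medoids" instances
samples = [10.0, 12.0, 14.0, 16.0, 30.0, 40.0]
labels = ["k-means"] * 4 + ["k-medoids"] * 2
balanced = balance_classes(samples, labels, per_class=2)
normalized = min_max_normalize([10.0, 20.0, 30.0])
print(balanced, normalized)
```

In practice the normalization bounds would be computed on the training subset and reused for the validation and testing subsets, so that no information leaks from held-out data.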

ANN Model Construction
ANNs have a large number of hyperparameters to tune compared with most other ML algorithms. Consequently, this subsection provides a concise but adequate description of the standard hyperparameters of an ANN model and their tuning.
The first step in hyperparameter tuning is selecting the layer type [39]. Since the data used in this study are non-linear, we investigate a fully connected multi-layer perceptron (MLP) network in which the input from the dataset propagates in one direction through one or more hidden layers. Therefore, using the normalized feature vectors obtained through (31) and their corresponding labels, we can build an ANN model consisting of one input layer, $l_i = 1$, $l_h \in \{1, 2, \ldots, L\}$ hidden layers, and one output layer, $l_o = 1$, for the PSS prediction. The input layer consists of $m_i = 8$ neurons, which represent the input feature vector $x_i$ of the ANN given in (32). Each term in (32) is a real number and is described in detail in Table 2. Moreover, the output layer consists of $m_o = 2$ neurons, which is the total number of classes that we want to predict. The number of neurons $m_h$ per hidden layer is determined according to (33) [40]. Consequently, if there is one hidden layer ($l_h = 1$), the number of neurons is 63 according to (33). Similarly, the number of neurons for two hidden layers ($l_h = 2$) is 31.5 per layer, resulting in the selection of 32 and 31 neurons for the first and second hidden layers, respectively. Additionally, we model ANNs with $l_h = 3$ and $l_h = 4$ hidden layers, and the number of neurons per hidden layer is listed in Table 3.
The next step in hyperparameter tuning is to determine the activation and loss functions. In this study, the rectified linear unit (ReLU) activation function is employed in the hidden layers. It is simple to implement and overcomes the limitations of widely used activation functions such as Sigmoid and Tanh. Furthermore, since PSS may be seen as a binary classification problem, the output layer activation function is SoftMax. Regarding the loss function, cross-entropy is utilized since it is the most widely used for classification problems. Therefore, in order to find the best ANN hyperparameters, the selected loss function should be minimized.
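The even distribution of the 63 hidden neurons over the hidden layers can be reproduced with a short helper. The function names are illustrative, and the rule of assigning any leftover neurons to the earliest layers is inferred from the reported 32/31 split rather than stated as a formula in the text.

```python
def split_neurons(total, layers):
    """Distribute `total` hidden neurons as evenly as possible, earlier layers first."""
    base, extra = divmod(total, layers)
    return [base + 1 if i < extra else base for i in range(layers)]

def layout(inputs, hidden, outputs):
    """Layout string in the form used for the ANN models, e.g. 8-32-31-2."""
    return "-".join(str(m) for m in [inputs, *hidden, outputs])

for depth in (1, 2, 3, 4):
    print(layout(8, split_neurons(63, depth), 2))
```

The four printed layouts match the structures evaluated in the text: 8-63-2, 8-32-31-2, 8-21-21-21-2, and 8-16-16-16-15-2.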
The minimization of the loss function is achieved through gradient descent (GD) with momentum backpropagation. The momentum term steers the GD along the relevant direction and dampens the oscillations in irrelevant directions. To tune it, the grid search method is utilized. Accordingly, the momentum is tested for values between 0.2 and 1 with a step of 0.1. In the last phase of hyperparameter tuning, the learning rate and the number of epochs are chosen. The learning rate is evaluated for values between 0.001 and 0.1 with a step of 0.001, while the number of epochs ranges from 1 to 1000. In addition, an early stopping criterion is used to improve the model's generalization capability and minimize overfitting. Finally, Table 4 lists all the finalized hyperparameters of the ANN models, derived throughout the training, validation, and testing process.
Figure 4 presents the evaluation of the training, validation, and testing phases in terms of the loss function versus the number of epochs. In essence, the number of epochs directly affects the convergence of the adopted method. With too few epochs, the algorithm may converge to a local minimum, whereas too many epochs may lead to over-learning. The results in Figure 4 for the modelled ANNs show that the loss function for all processes, i.e., training, validation, and testing, converges smoothly, obtaining constant loss values and reaching the global minimum in a short period. The global minimum loss attained at convergence during the testing phase and the corresponding epoch values are listed in Table 3. According to Table 3, the ANN with two hidden layers demonstrates the best performance among all the examined ANN models, providing the minimum loss score of 0.06. Furthermore, the training time is also recorded for each ANN layout.
Specifically, the training times for the ANNs with one, two, three, and four hidden layers are 0.81, 0.92, 1.4, and 1.6 seconds, respectively. Comparing the training times of the assessed ANN models, it is evident that the training time depends directly on the applied layout structure. Finally, the conventional time complexity (TTC) for any ANN layout is $O(n^3)$ [37]. The TTC represents the standard theoretical asymptotic complexity, which takes into account only the number of training samples $n$. It only considers training samples, since the training phase is the most time-consuming operation in ML algorithms and occurs offline rather than in real-time scenarios.
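The search grids and the momentum update used in this subsection can be sketched as follows. The grids mirror the ranges given in the text. The update rule is the classical momentum form, applied here to a toy one-dimensional quadratic loss rather than to the ANN, and the chosen learning rate, momentum value, starting point, and function names are illustrative assumptions.

```python
def grid(start, stop, step):
    """Inclusive grid of candidate values, rounded to avoid floating-point drift."""
    n = round((stop - start) / step)
    return [round(start + i * step, 10) for i in range(n + 1)]

momentum_grid = grid(0.2, 1.0, 0.1)           # 0.2, 0.3, ..., 1.0
learning_rate_grid = grid(0.001, 0.1, 0.001)  # 0.001, 0.002, ..., 0.1

def gd_momentum(grad, w0, lr, momentum, steps):
    """Gradient descent with a classical momentum term on a scalar parameter."""
    w, v = w0, 0.0
    for _ in range(steps):
        v = momentum * v - lr * grad(w)  # velocity accumulates past gradients
        w = w + v
    return w

# Toy loss f(w) = w^2 with gradient 2w (assumption, stands in for the cross-entropy loss)
w_final = gd_momentum(lambda w: 2.0 * w, w0=5.0, lr=0.1, momentum=0.5, steps=100)
print(len(momentum_grid), len(learning_rate_grid), w_final)
```

A full grid search would train one model for every (momentum, learning rate) pair and keep the pair with the lowest validation loss, with early stopping bounding the number of epochs for each candidate.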

ANN Model Selection
This section presents the evaluation results obtained from the ANN models for the testing set. The evaluation of the ANN models and, by extension, the choice of the ANN model to solve the PSS classification problem are based on the accuracy, precision, recall, and F1 score performance metrics.
Accuracy, precision, recall, and F1 score are commonly used evaluation metrics for assessing the performance of ML models, particularly in classification tasks. These metrics are calculated based on the number of true positive (TP), true negative (TN), false positive (FP), and false negative (FN) predictions made by the model. Accuracy is the proportion of correct predictions made by the model out of all predictions made. In the context of sum-rate maximization, a high accuracy score indicates that the ANN predicts the best PSS more often, and it is calculated as follows:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
Precision is the proportion of true positive predictions made by the model out of all positive predictions made. In the context of sum-rate maximization, a high precision score indicates that when the ANN predicts a PSS, it is more likely to be the prediction that maximizes the system sum rate, and it can be expressed as follows:
$$\text{Precision} = \frac{TP}{TP + FP}$$
Recall (also known as sensitivity or true positive rate) is the proportion of true positive predictions made by the model out of all actual positive cases. In terms of sum-rate maximization, a high recall score indicates that the ANN is able to find more of the actual PSS solutions, and it is calculated as follows:
$$\text{Recall} = \frac{TP}{TP + FN}$$
The F1 score is the harmonic mean of precision and recall. In the context of sum-rate maximization, a high F1 score indicates that the ANN has a good balance of precision and recall, making fewer false PSS predictions while also identifying most of the relevant cases. It is calculated as:
$$F1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
Figures 5 and 6 present the evaluation results obtained from the ANN models for the testing set. More specifically, the accuracy of each ANN model is depicted in Figure 5, while Figure 6 illustrates the mean precision, recall, and F1 score obtained from each ANN model.
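The four metrics follow directly from the confusion-matrix counts, as the sketch below shows. The counts are hypothetical and are not taken from the paper's results; the function name is illustrative.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts for one class treated as "positive"
acc, prec, rec, f1 = classification_metrics(tp=45, tn=40, fp=5, fn=10)
print(round(acc, 4), round(prec, 4), round(rec, 4), round(f1, 4))
```

For a two-class problem such as PSS, mean precision, recall, and F1 values can be obtained by computing the metrics once with each class treated as positive and averaging the results.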
The classification accuracy in Figure 5 reveals that the best prediction is achieved by the ANN with two hidden layers, ANN$_{8-32-31-2}$. Comparing the performance of the different ANN layouts, the prediction accuracy increases until the neural network reaches two hidden layers in depth. Then, when the depth of the ANN is extended beyond two hidden layers, the accuracy diminishes. Specifically, the prediction accuracy increases from 92.5% for a single hidden layer (ANN$_{8-63-2}$) to 95.32% for a two-layered structure (ANN$_{8-32-31-2}$) and then decreases to 92.3% and 92.7% for a three-layered (ANN$_{8-21-21-21-2}$) and a four-layered (ANN$_{8-16-16-16-15-2}$) structure, respectively. As can be observed in Figure 6, the assessed ANN models exhibit strong performance, with an F1 score greater than 91% and an average precision and average recall greater than 91%. Among the evaluated ANNs, the neural network with two hidden layers, ANN$_{8-32-31-2}$, achieves the best prediction result. This model yields a mean precision of 94.12%, a mean recall of 93.14%, and an average F1 score of 93.63%. This level of performance on a balanced data set implies that the model has recognized and formed strong correlations between features and class and has avoided overfitting issues. Moreover, this success is related to the two-layered neural network's ability to effectively approximate nonlinear functions and reliably predict the PSS class value. Hence, the ANN with two hidden layers is chosen to solve the PSS classification problem.

Performance Evaluation
In this section, the system sum rate and the spectral efficiency results from Monte Carlo simulations conducted in MATLAB are presented to evaluate the performance of the proposed ANN-based PSS. The simulations were executed on a computer with a Windows 10 64-bit operating system, an Intel Core i7-8700 CPU, and 16 GB of RAM. Moreover, the impact of various system parameters, such as the D2D bandwidth allocation and the UFBS transmit power $P_u$, on the performance of the proposed method is studied.
Furthermore, the proposed ANN-based PSS is compared against the standalone UFBS placement schemes k-means and k-medoids [7,9]. These two methods will be referred to as the k-means deployment process (MEA-DP) and the k-medoids deployment process (MED-DP), respectively. More specifically, comparisons are made between different network schemes: the cooperative UAV-NOMA and D2D scheme, termed NOMA-D2D, and two standalone UAV transmission schemes without D2D communication capabilities between the GMTs, namely the UAV-NOMA optimal user pairing scheme [26], called NOMA, and the time-domain UAV-OMA scheme, termed OMA. In order to assess the performance of the proposed scheme as well as the compared ones, we define the spectral efficiency as:
$$\text{Spectral efficiency} = \frac{R_{ach}}{B_{occ}}$$
where $R_{ach}$ is the achievable system sum rate and $B_{occ}$ denotes the total utilized network bandwidth. For both the standalone OMA and NOMA transmission schemes, $B_{occ} = B_u$, while for the NOMA-D2D scheme, $B_{occ} = B_d + B_u$. The rest of the selected parameters for the abovementioned scenarios are listed in Table 5.
Figure 7 presents the spectral efficiency performance of the proposed ANN-based PSS for different terrestrial D2D bandwidth values and for the different network schemes. As can be observed, the proposed ANN-based PSS scheme combined with the NOMA-D2D transmission technique for $B_d = 0.2$ MHz provides significant spectral efficiency gains compared to the other NOMA-D2D cooperative networks with $B_d = 0.2$ MHz and the standalone NOMA and OMA schemes. It is noteworthy that the proposed strategy with $B_d = 0.1$ MHz exhibits performance comparable to that with $B_d = 0.2$ MHz for low UFBS transmit power values. Conversely, for high UFBS transmit power, the proposed strategy with $B_d = 0.2$ MHz yields near-optimal spectral efficiency.
Also, regarding the NOMA-D2D cooperative network with $B_d \le 1.2$ MHz, the proposed method achieves higher spectral efficiency than the standalone NOMA and OMA schemes for all UFBS transmit power values. In contrast, for $B_d > 1.2$ MHz, the spectral efficiency of the suggested method in a NOMA-D2D cooperative network is inferior to that of NOMA. This occurs because there is no need for additional bandwidth, since the weak users' rates are always constrained by the decoding rates of their signals at the strong users (21). Therefore, for the considered communication network, $B_d$ values greater than 1.2 MHz are considered a waste of resources. Additionally, for $B_d = 1.2$ MHz, two regimes can be distinguished. More specifically, when $P_u$ is lower than 20 dBm, the NOMA-D2D cooperative network outperforms the NOMA scheme, while for $P_u > 20$ dBm, the standalone NOMA outperforms the NOMA-D2D cooperative scheme. This occurs for large $P_u$ values because the A2G channel between the weak GMTs and the UFBS is strengthened, resulting in greater achievable rates for the weak GMTs via the direct A2G connection. Hence, the D2D communication between the $K$ pairs is mostly avoided, as the data rates offered via the D2D links are lower than those achievable through the A2G links. This claim can be verified by expression (21). Moreover, spectral efficiency degradation is observed when the terrestrial D2D bandwidth $B_d$ is greater than 1.2 MHz. In this case, the weak users cannot efficiently exploit the capabilities offered by the wireless D2D link, as the rate received through the terrestrial cooperation is restricted by the decoding rates achieved by the strong users of each pair. This observation follows from the constraints imposed by (17)-(19), as well as from the explanation of cases 1 and 2 in Section 2.4.
As an illustrative case of this phenomenon, the baseline standalone OMA scheme performs better than the NOMA-D2D scheme with $B_d = 3.0$ MHz in terms of spectral efficiency. Therefore, in cooperative NOMA schemes such as the proposed one, the value of the terrestrial D2D bandwidth $B_d$ should be carefully chosen to avoid wasting spectrum resources. Also, in the NOMA-D2D cooperative network, for UFBS transmit power in the range of 0 to 12 dBm, the spectral efficiency is approximately the same for $B_d$ values equal to 0.1 and 0.2 MHz. However, for UFBS transmit power higher than 12 dBm, the proposed method with $B_d = 0.2$ MHz achieves higher spectral efficiency than the others. In other words, $B_d = 0.2$ MHz is a near-optimal D2D bandwidth value for the considered communication system.
In Figure 8, the sum rate performance of the proposed ANN-based PSS is examined for the different network schemes. It can be observed that employing the suggested PSS technique in the NOMA-D2D cooperative network outperforms the OMA and NOMA schemes for all UFBS transmit power values, regardless of the D2D bandwidth allocation. Moreover, for the NOMA-D2D cooperative network, the sum rate is approximately the same for any value of $B_d > 0.1$ MHz. This is supported by (21), which demonstrates that there is no need to devote more bandwidth to D2D transmission. Also, for UFBS transmit power in the range of 0 to 12 dBm, the sum rate is approximately the same for all $B_d$ values. Hence, large $B_d$ values for low-to-medium UFBS transmit powers are a waste of resources, and for that UFBS transmit power range there is a maximum value of $B_d$ that should not be exceeded.
Nevertheless, the findings from Figures 7 and 8 demonstrate that dynamic bandwidth allocation is required for D2D out-band communication to improve both the sum rate and the spectral efficiency performance.
Figures 9 and 10 show the effects of the different placement methods on the spectral efficiency and the system sum rate, respectively. More specifically, Figure 9 illustrates the spectral efficiency performance of the different communication schemes, i.e., NOMA-D2D with $B_d = 0.2$ MHz, NOMA, and OMA, under the different placement procedures. As can be observed, the ANN-based PSS applied to the NOMA-D2D cooperative network scheme achieves significant spectral efficiency gains compared to MEA-DP and MED-DP for all UFBS transmit power values. Also, considering each network scheme individually (i.e., NOMA-D2D, NOMA, and OMA), the proposed ANN-based PSS outperforms the other two methods for all UFBS transmit power values. This results from the ability of the ANN to recognize patterns that indicate when each method should be applied. Furthermore, regardless of the placement method, the cooperation between the aerial and D2D networks, i.e., the NOMA-D2D method, is preferable, since it achieves the highest spectral efficiency compared to the standalone NOMA and OMA schemes. Moreover, for all UFBS transmit power values, MEA-DP outperforms the MED-DP scheme in all three network configurations. This is justified by the explanation given in Section 4. Specifically, as the GMTs are placed randomly and uniformly in the region of interest, the probability of an outlying user appearing is very low. Consequently, in most cases, the k-means algorithm places the UFBS at a point that is equidistant from the users, thus improving the channel gains relative to k-medoids.
Lastly, the spectral efficiency of the ANN-based PSS applied to the standalone NOMA scheme is higher than that of MED-DP in the NOMA-D2D cooperative network scheme for $P_u$ values of up to approximately 22 dBm. This occurs due to the improvement of the A2G channels through the proposed placement scheme. Consequently, in contrast to other cooperative systems, such as satellite D2D cooperative networks [27], the success of aerial and D2D cooperative networks strongly relies on the UFBS placement procedure. Hence, an inaccurate prediction of the UAV's position might degrade the overall network quality and render the D2D network unnecessary.
Next, Figure 10 presents the sum rate for $B_d = 0.2$ MHz and different placement procedures for the NOMA-D2D, NOMA, and OMA network schemes. Throughout the $P_u$ range and regardless of the placement method, the sum rate of the NOMA-D2D cooperative network is superior to that of NOMA and OMA. As with the spectral efficiency in Figure 9, the proposed ANN-based PSS outperforms the other two placement procedures for all network schemes. Moreover, it is observed that the proposed method, when applied in a NOMA scheme, can achieve higher spectral efficiency gains than the MED-DP applied in NOMA-D2D for $P_u > 22$ dBm. Therefore, in such a scenario, deploying the proposed method would make it possible to avoid D2D transmission and save the entire D2D bandwidth.
Overall, the sum rate results of the NOMA-D2D cooperative scheme for all placement procedures indicate that the weak user's achievable rate can be significantly improved. This advantage results from strong users cooperating with weak users of the system through out-band D2D communication. However, the sum rate and the spectral efficiency in all network schemes are heavily contingent on the UFBS placement within the region of interest. According to the results in Figures 7-10, the proposed ANN-based PSS outperforms the other two methods in all network schemes and can offer terrestrial users reliable and high-quality communication.
Finally, Table 6 summarizes the key characteristics of the proposed ANN-based PSS and the compared MEA-DP and MED-DP schemes. Specifically, our method is less sensitive to outliers compared to MEA-DP, making it more robust in noisy environments. It also has higher reliability compared to both MEA-DP and MED-DP. Regarding spectral efficiency and sum rate, our method outperforms both MEA-DP and MED-DP, indicating that it may be a better choice for optimizing the utilization of resources and achieving higher data transmission rates in the given scenario.

Conclusions and Future Directions
In this paper, we proposed an ANN-based PSS method that maximizes the spectral efficiency and the sum rate in a NOMA-D2D cooperative network. This is the first time that supervised ML methods are combined with unsupervised ones to enhance the placement procedure of the UFBS, and the examples demonstrate the improvements achieved. To evaluate the performance of the ANN-based PSS policy, we compared it with two standalone unsupervised ML schemes. The results showed that the proposed method outperforms the other two in different network scenarios, such as the NOMA-D2D cooperative, NOMA, and OMA schemes, in terms of sum rate and spectral efficiency. Furthermore, the results show that utilizing the proposed method in a UAV-aided D2D-NOMA cooperative network can offer terrestrial users reliable and high-quality communication compared with standalone NOMA or OMA schemes.
Possible future directions include studying various machine learning models as base learners and forming ensemble approaches to enhance the predictability of the placement procedure. Furthermore, in future work, we plan to examine machine learning methods for identifying the optimal D2D bandwidth value that achieves the maximum sum rate and, simultaneously, the maximum spectral efficiency in a UAV-aided D2D-NOMA cooperative network. Finally, of potential interest is the integration of virtual MIMO in the context of aerial-terrestrial networks to improve communication between UAVs and other devices. Specifically, UAVs typically have strict size, weight, and power constraints, which can make it challenging to install multiple antennas and radio resources on them. By using virtual MIMO, several UAVs can work together as a single MIMO system and share their antennas and radio resources, increasing the range and capacity of the communication [41,42]. In addition, virtual MIMO can also improve the robustness of communication in UAV networks, as it can reduce the impact of fading and interference caused by the dynamic and often hostile environment in which UAVs operate.

Data Availability Statement:
The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: